Omix_vignette_CV_MTG.RmdThe goal of Omix is to provide tools in R to build a complete analysis workflow for integrative analysis of data generated from multi-omics platform.
Generate a multi-omics object using
MultiAssayExperiment.
Quality control of single-omics data.
Formatting, normalisation, denoising of single-omics data.
Separate single-omics analyses.
Integration of multi-omics data for combined analysis.
Publication quality plots and interactive analysis reports based of shinyApp.
Currently, Omix supports the integration of bulk transcriptomics and bulk proteomics.
The Omix pipeline requires the following input:
rawdata_rna: A data-frame containing raw RNA count
with rownames as gene and colnames as samples.
rawdata_protein: A data-frame containing protein
abundance with rownames as proteins and colnames as samples.
map_rna: A data-frame of two columns named
primary and colname where primary should
contain unique sample name with a link to sample metadata and colname is
the column names of the rawdata_rna data-frame.
map_protein: A data-frame of two columns named
primary and colname where primary should
contain unique sample name with a link to sample metadata and colname is
the column names of the rawdata_protein
data-frame.
metadata_rna: A data-frame containing
rna assay specific metadata where rownames are same as the
colnames of rawdata_rna data.frame.
metadata_protein: A data-frame containing
protein assay specific metadata where rownames are same as
the colnames of rawdata_protein data.frame.
individual_metadata: A data-frame containing
individual level metadata for both omics assays.
Call required packages for vignette:
library(Omix)##
## Registered S3 method overwritten by 'ggtree':
## method from
## identify.gg ggfun
##
##
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:Biobase':
##
## combine
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
First we must load the data from Omix for the vignette:
outputDir <- tempdir()
ctd_fp <-file.path(outputDir, "ctd.rds")
ensembl_fp <- file.path(outputDir, "ensembl.rds")
GO_fp <- file.path(outputDir, "GO_Biological_Process_2021.txt")
TF_fp <- file.path(outputDir, "Enrichr_Queries.gmt.txt")
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/GO_Biological_Process_2021.txt",destfile=GO_fp ,headers = c(Authorization = paste("token", "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")))
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ctd-2.rds",destfile=ctd_fp ,headers = c(Authorization = paste("token", "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")))
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ensembl_mappings_human.tsv",destfile=ensembl_fp ,headers = c(Authorization = paste("token", "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")))
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/Enrichr_Queries.gmt.txt",destfile=TF_fp ,headers = c(Authorization = paste("token", "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")))
synapser::synLogin(email=Sys.getenv('synapse_id'), password=Sys.getenv('synapse_password'))## Welcome, ems2817!
## NULL
rawdata_rna <- read.csv(synapser::synGet('syn51533734')$path, header=T, stringsAsFactors = F, row.names=1)
rawdata_protein <- read.csv(synapser::synGet('syn51533735')$path, header=T, stringsAsFactors = F, row.names=1)
map_rna <-read.csv(synapser::synGet('syn51533731')$path, header=T, stringsAsFactors = F, row.names=1)
map_prot <- read.csv(synapser::synGet('syn51533730')$path, header=T, stringsAsFactors = F, row.names=1)
metadata_rna<-read.csv(synapser::synGet('syn51533733')$path, header=T, stringsAsFactors = F, row.names=1)
metadata_prot <- read.csv(synapser::synGet('syn51533732')$path, header=T, stringsAsFactors = F, row.names=1)
individual_metadata <-read.csv(synapser::synGet('syn51533736')$path, header=T, stringsAsFactors = F, row.names=1)
ctd <- readRDS(paste0(outputDir,'/ctd.rds'))
ensembl <-read.delim(file=paste0(outputDir,'/ensembl.rds'), sep = '\t', header = TRUE)
rna_qc_data_matrix <- NULLSanity check
## [1] TRUE
## [1] TRUE
rawdata_rna is a data-frame of raw counts, with features
as rows and samples as columns
print(rawdata_rna[1:5,1:5])## IGF117756 IGF117759 IGF117764 IGF117772 IGF117778
## ENSG00000223972 1 1 1 0 0
## ENSG00000227232 40 30 18 15 35
## ENSG00000278267 1 0 3 4 3
## ENSG00000243485 0 0 0 0 0
## ENSG00000284332 0 0 0 0 0
rawdata_protein is a data-frame of raw protein
abundances, with features as rows and samples as columns
print(rawdata_protein[15:20,15:20])## BBN_7540_MTG_P1WG4_001 BBN_7545_MTG_P1WG7_001
## TUBAL3 7158.6100 9201.4900
## FBLL1 67.4334 60.1956
## CASTOR2 17.3159 26.2499
## UNC119;UNC119B 20.3847 12.3418
## YJEFN3 NA 23.4309
## ARHGAP32 11.9989 17.0831
## BBN_7617_MTG_P1WH4_001 BBN_9984_MTG_P3WD1_001
## TUBAL3 7294.23000 8982.00000
## FBLL1 85.75520 62.94880
## CASTOR2 15.63710 19.52160
## UNC119;UNC119B 19.64770 8.60717
## YJEFN3 NA 25.40000
## ARHGAP32 5.57716 9.02670
## BBN002.26094_MTG_P3WC8_001 BBN002.29089_MTG_P3WD7_001
## TUBAL3 10871.4000 8487.21000
## FBLL1 46.9068 57.40500
## CASTOR2 NA 22.37250
## UNC119;UNC119B NA 17.10210
## YJEFN3 19.7786 16.63510
## ARHGAP32 NA 7.95878
Maps are data frame containing two columns: primary and
colname. The primary column should be the
individual ID the individual metadata, and the colname the
matched sample names from the raw matrices columns. If sample and
individual ids are the same, maps aren’t needed (primary and colnames
are the same).
## primary colname
## 1 BBN_10099-MTG BBN_10099_MTG_P3WH10_001
## 2 BBN_10109-MTG BBN_10109_MTG_P4WA3_001
## 3 BBN_10611-MTG BBN_10611_MTG_P2WD10_001
## 4 BBN_14801-MTG BBN_14801_MTG_P1WH9_001
## 5 BBN_16213-MTG BBN_16213_MTG_P2WH3_001
## 6 BBN_18399-MTG BBN_18399_MTG_P2WF5_001
## primary colname
## 12 BBN003.32235-MTG IGF117756
## 15 BBN_7617-MTG IGF117759
## 20 BBN_16213-MTG IGF117764
## 25 BBN_7519-MTG IGF117772
## 30 BBN_7540-MTG IGF117778
## 36 BBN003.32227-MTG IGF117784
Technical metadata are data-frames contain the column
colname that should match the sample names in the raw
matrices, and any additional columns related to technical artefacts like
batches
## sample_id sample_name RIN Amyloid_Beta pTau
## IGF117756 BBN003.32235-MTG IGF117756 2.7 2.027670 6.98931000
## IGF117759 BBN_7617-MTG IGF117759 2.0 3.262270 2.55134000
## IGF117764 BBN_16213-MTG IGF117764 4.6 0.147259 0.00247153
## IGF117772 BBN_7519-MTG IGF117772 7.5 NA NA
## IGF117778 BBN_7540-MTG IGF117778 4.8 3.828680 2.77941000
## IGF117784 BBN003.32227-MTG IGF117784 1.7 5.015050 2.94966000
## pct_pTau_Positive PHF1
## IGF117756 5.25975000 2.420650000
## IGF117759 0.37497600 1.564080000
## IGF117764 0.00876194 0.000519246
## IGF117772 NA NA
## IGF117778 2.56623000 NA
## IGF117784 0.71418700 2.409610000
## sample_id sample_name batch
## BBN_10099_MTG_P3WH10_001 BBN_10099-MTG BBN_10099_MTG_P3WH10_001 P3
## BBN_10109_MTG_P4WA3_001 BBN_10109-MTG BBN_10109_MTG_P4WA3_001 P4
## BBN_10611_MTG_P2WD10_001 BBN_10611-MTG BBN_10611_MTG_P2WD10_001 P2
## BBN_14801_MTG_P1WH9_001 BBN_14801-MTG BBN_14801_MTG_P1WH9_001 P1
## BBN_16213_MTG_P2WH3_001 BBN_16213-MTG BBN_16213_MTG_P2WH3_001 P2
## BBN_18399_MTG_P2WF5_001 BBN_18399-MTG BBN_18399_MTG_P2WF5_001 P2
individual_metadata should contain individual level
metadata, where one column matches the primary column in
maps.
## sample_id brain_region age trem2_all BBN_ID sex diagnosis apoe_final
## 1 BBN_10099-MTG MTG 91 CV BBN_10099 F Control E3/E3
## 2 BBN_10109-MTG MTG 80 CV BBN_10109 F Control E2/E3
## 3 BBN_10611-MTG MTG 92 CV BBN_10611 F AD E3/E3
## 4 BBN_14801-MTG MTG 65 CV BBN_14801 F Control E3/E3
## 5 BBN_16213-MTG MTG 73 CV BBN_16213 M Control E3/E3
## 6 BBN_18399-MTG MTG 77 CV BBN_18399 M Control E3/E3
## Braak PMD amyloid pTau PHF1 AD
## 1 2 22 0.5852850 0.00649899 0.001686760 0
## 2 2 23 0.7826840 0.92852500 0.678769000 0
## 3 3 30 0.3078820 0.01312750 0.017976000 1
## 4 1 47 NA NA NA 0
## 5 0 23 0.1472590 0.00247153 0.000519246 0
## 6 0 11 0.0455756 0.00222531 0.010463800 0
To run Omix, we first need to generate a multi-omics object The package currently supports transcriptomics and proteomics bulk data only.
# Convert metadata type from character to factor
individual_metadata$diagnosis <- factor(individual_metadata$diagnosis,
levels = c("Control", "AD"))
individual_metadata$sex <- factor(individual_metadata$sex)
#Generate multiassay object
multiomics_object_MTG=generate_multiassay(rawdata_rna =rawdata_rna,
rawdata_protein = rawdata_protein,
individual_to_sample=FALSE,
map_rna = map_rna,
map_protein = map_prot,
metadata_rna = metadata_rna,
metadata_protein = metadata_prot,
individual_metadata = individual_metadata,
map_by_column = 'sample_id',
rna_qc_data=FALSE,
rna_qc_data_matrix=NULL,
organism='human')## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## ✔ Ensembl ID conversion to gene symbol
## ✔ Retrieval of gene biotype
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## ✔ RNA raw data loaded
## class: SummarizedExperiment
## dim: 58884 29
## metadata(1): metadata
## assays(1): rna_raw
## rownames(58884): ENSG00000223972 ENSG00000227232 ... ENSG00000277475
## ENSG00000268674
## rowData names(3): ensembl_gene_id gene_name gene_biotype
## colnames(29): IGF117756 IGF117759 ... IGF117836 IGF117837
## colData names(7): sample_id sample_name ... pct_pTau_Positive PHF1
## ✔ Protein raw data loaded
## class: SummarizedExperiment
## dim: 3228 39
## metadata(1): metadata
## assays(1): protein_raw
## rownames(3228):
## IGKV2-28;IGKV2-29;IGKV2-30;IGKV2-40;IGKV2D-26;IGKV2D-28;IGKV2D-29;IGKV2D-30;IGKV2D-40
## IGKV3-11;IGKV3D-11 ... FAM169A SEC23IP
## rowData names(1): gene_name
## colnames(39): BBN_10099_MTG_P3WH10_001 BBN_10109_MTG_P4WA3_001 ...
## BBN003.35446_MTG_P2WD1_001 BBN003.35520_MTG_P2WD4_001
## colData names(3): sample_id sample_name batch
## ✔ MultiAssayExperiment object generated!
The MultiAssayExperiment object was succesfully created. The
following steps will process and perform QC on each omic layers of the
multiassay object.
multiomics_object_MTG=process_rna(multiassay=multiomics_object_MTG,
transformation='rlog',
protein_coding=TRUE,
min_count = 10,
min_sample = 0.5,
dependent = "diagnosis",
levels = c("Control","AD"),
covariates=c('age','sex','PMD'),
filter=TRUE,
batch_correction=TRUE,
batch=NULL,
remove_sample_outliers= TRUE)## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## the design formula contains one or more numeric variables with integer values,
## specifying a model with increasing fold change for higher values.
## did you mean for this to be a factor? if so, first convert
## this variable to a factor using the factor() function
## the design formula contains one or more numeric variables that have mean or
## standard deviation larger than 5 (an arbitrary threshold to trigger this message).
## Including numeric variables with large mean can induce collinearity with the intercept.
## Users should center and scale numeric variables in the design to improve GLM convergence.
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## ✔ NORMALISATION & TRANSFORMATION
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## ✔ GENE FILTERING
## ✔ Keeping only protein coding genes
## ✔ 39155 / 58884 non protein coding genes were dropped
## ✔ 19729 protein coding genes kept for analysis
## ✔ QC parameters selected require genes to have at least 50 % of samples with counts of 10 or more
## ✔ 4884 / 19729 genes were dropped
## ✔ 14845 genes kept for analysis
## ✔ RLOG TRANSFORMATION
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## ✔ BATCH CORRECTION
## ✔ Using Limma on to remove technical artefacts, and age as biological confoundersUsing Limma on to remove technical artefacts, and sex as biological confoundersUsing Limma on to remove technical artefacts, and PMD as biological confounders
## ✔ SAMPLE OUTLIERS REMOVAL
## ✔ 0 sample outliers out of 29 samples detected and dropped
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## ✔ Transcriptomics data processed!
## ✔ Processing parameters saved in metadata
multiomics_object_MTG=process_protein(
multiassay=multiomics_object_MTG,
filter=TRUE,
min_sample = 0.5,
dependent = "diagnosis",
levels = c("Control","AD"),
imputation = 'minimum_value',
remove_feature_outliers= TRUE,
batch_correction= TRUE,
batch="batch",
correction_method="median_centering",
remove_sample_outliers=TRUE,
denoise=TRUE,
covariates=c('PMD','sex','age'))## ✔ SCALING NORMALIZATION
## ✔ FILTERING
## ✔ 322 / 3228 proteins filtered
## ✔ 2906 proteins kept for analysis
## ✔ IMPUTATION
## ✔ Imputation of remaining missing values based on
## 50% of minimum level of abundance for each protein
## ✔ BATCH CORRECTION
## ✔ DENOISING BIOLOGICAL COVARIATES
## ✔ Using Limma on PMD as biological confoundersUsing Limma on sex as biological confoundersUsing Limma on age as biological confounders
## ✔ REMOVING FEATURE OUTLIERS
##
## ── Remove proteins with average protein abundance
## across samples in bottom and u
## ✔ 266 feature outliers out of 2906 features detected and dropped
## ✔ REMOVING SAMPLE OUTLIERS
## ✔ 0 sample outliers out of 39 samples detected and dropped
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
The processing parameter are stored in the object
Now we run a differential expression analysis between groups of
interest, where the first level, here control is the reference. The
function requires to run process_rna first, with covariates
of interest. These are adjusted for in the DE.
multiomics_object_MTG=rna_DE_analysis(multiassay=multiomics_object_MTG,
dependent='diagnosis',
levels=c('Control','AD'),
filter_protein_coding = TRUE,
log2FoldChange = 0.2)## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## using pre-existing size factors
## estimating dispersions
## gene-wise dispersion estimates
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## mean-dispersion relationship
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## final dispersion estimates
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## fitting model and testing
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## No significant impacted pathways found at FDR <= 0.05!
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!
Differential expression results for transcripts are saved in the
multiomics_object_MTG@metadata$DEG slot
DT::datatable(multiomics_object_MTG@metadata$DEG$ADvsControl,
rownames = FALSE,
escape = FALSE,
options = list(pageLength = 20,
scrollX=T,
autoWidth = TRUE,
dom = 'Blfrtip'))Volcano plot of differentially expressed genes is saved in the
multiomics_object_MTG@metadata$DEG$plot slot
volcano_interactive(multiomics_object_MTG@metadata[["DEG"]]$ADvsControl, padj=0.05)Now we run a differential expression analysis between groups of
interest, where the first level, here control is the reference. If the
data is already denoised for covariates of interest in
process_protein, set covariates as `NULL``
multiomics_object_MTG=protein_DE_analysis(multiomics_object_MTG,
slot="protein_processed",
dependent='diagnosis',
covariates=NULL,
levels=c('Control','AD'),
log2FoldChange = 0.2)## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
Differential expression results for proteins are saved in the
multiomics_object_MTG@metadata$DEP slot
DT::datatable(multiomics_object_MTG@metadata$DEP$ADvsControl,
rownames = FALSE,
escape = FALSE,
options = list(pageLength = 20,
scrollX=T,
autoWidth = TRUE,
dom = 'Blfrtip'))Volcano plot of differentially abundant proteins is saved in the
multiomics_object_MTG@metadata$DEP$plot slot
volcano_interactive(multiomics_object_MTG@metadata[["DEP"]]$ADvsControl, padj=0.05)Since the magnitude of changes are not directly comparable between platforms, but their directionality are, we assessed whether significant differential features overlapped between the two platforms and whether their direction of effects were concordant or discordant.
multiomics_object_MTG=single_omic_comparisons(multiassay=multiomics_object_MTG,
slot = "ADvsControl",
threshold = 0.05,
pvalue = 'pvalue',
filtering_options=NULL,
additional_database_gmt=GO_fp)
comparison=multiomics_object_MTG@metadata$single_omic_comparison$ADvsControl
volcano_interactive_comparison(comparison$dataframe)
corr= cor(comparison$dataframe$log2FoldChange_transcriptomics,
comparison$dataframe$log2FoldChange_proteomics)We found a r=0.2318711 correlation between pairwise log2FoldChanges (p-value <0.05) indicating a moderate positive correlation between the mRNA and protein level changes, which is reinforced by the existence of discordant pairs.
DT::datatable(comparison$dataframe,
rownames = FALSE,
escape = FALSE,
options = list(pageLength = 20,
scrollX=T,
autoWidth = TRUE,
dom = 'Blfrtip'))Overall, these single platform analyses highlighted obvious differences between transcriptomics and proteomics and should be further studied to differentiate technical versus biological effects. Overall, results highlight that:
Protein with a positive Log2FC but negative mRNA regulation may point to those accumulating in the brain as AD progresses
comparison$dataframe$gene_name[which(comparison$dataframe$direction=='Discordant'& comparison$dataframe$log2FoldChange_proteomics>0.2)]## [1] "ACAN" "ACAT2" "ACOT8" "ACSL6" "ADAM10" "AKR1A1"
## [7] "AKR7A2" "ALDH9A1" "ALMS1" "ANK1" "ANP32A" "AP3M1"
## [13] "APOE" "APRT" "ARCN1" "ARF1" "ARF4" "ARL1"
## [19] "ARL3" "ARMC8" "ASS1" "ATP1B2" "ATP2B3" "ATP2B4"
## [25] "ATP8A1" "ATP9A" "BAG5" "BANF1" "BCL2L13" "BLOC1S2"
## [31] "BLVRB" "C1QA" "C1QC" "CACNA2D2" "CALR" "CAMK2G"
## [37] "CAMSAP2" "CAPNS1" "CBX3" "CCDC43" "CCDC91" "CCS"
## [43] "CHMP2B" "CHMP7" "CHORDC1" "CLIC4" "CNN3" "COMMD10"
## [49] "COMMD2" "COPB1" "COPG1" "COPS6" "CPE" "CPNE3"
## [55] "CPNE6" "DDOST" "DHX15" "DHX9" "DMD" "DNPEP"
## [61] "DRG1" "DYNC1I2" "EIF1" "EIF2B3" "EIF2B4" "EIF3D"
## [67] "EIF3E" "EIF4A3" "EMD" "EPM2AIP1" "ESD" "FBXL16"
## [73] "FDPS" "FHL1" "FKBP4" "FN3KRP" "FNTA" "FSD1L"
## [79] "GAK" "GBA" "GCA" "GLO1" "GNAQ" "GNAS"
## [85] "GSPT1" "GSTM5" "GYS1" "HBS1L" "HMGB1" "HNRNPK"
## [91] "HOOK3" "HSDL1" "HSPB6" "IFT27" "INPP1" "INPP5A"
## [97] "ISOC1" "ITPR1" "KCNA2" "LAMTOR2" "LCP1" "LGALS3"
## [103] "LIMS1" "LIN7C" "LMAN1" "LSM3" "LTA4H" "LUC7L2"
## [109] "LYSMD2" "MAOB" "MAPK1" "MARCKS" "MPC2" "MTCH1"
## [115] "MTCH2" "NAMPT" "NANS" "NDRG4" "NEFH" "NF1"
## [121] "NLGN2" "NOP56" "NPTX1" "NQO1" "NT5C1A" "OTUD6B"
## [127] "PAFAH1B1" "PCYOX1" "PELO" "PGD" "PHPT1" "PIK3R4"
## [133] "PITRM1" "PLAA" "PLCD1" "PLCXD3" "PLD3" "PPIL1"
## [139] "PPT1" "PRAF2" "PREP" "PREPL" "PRPSAP1" "PSMC4"
## [145] "PSMD10" "PSME2" "RAB39B" "RALYL" "RANBP9" "RANGAP1"
## [151] "RBBP9" "RBP1" "RCN1" "REPS1" "RMDN3" "RNPS1"
## [157] "RPN1" "RSU1" "RYR2" "S100A1" "S100A16" "SACM1L"
## [163] "SAE1" "SART3" "SEC13" "SEC22B" "SEC24C" "SERPINH1"
## [169] "SETD7" "SIRT3" "SKP1" "SLC1A3" "SLC2A3" "SLC3A2"
## [175] "SLC4A1" "SLC4A4" "SLC7A14" "SLC8A1" "SMS" "SNRNP200"
## [181] "SNRNP40" "SNRPA1" "SNTB2" "SON" "SQSTM1" "SRI"
## [187] "SRP54" "SRP72" "SRSF1" "SRSF2" "SRSF7" "SSB"
## [193] "STAM2" "SURF4" "SYP" "TBCB" "TMA7" "TMEM11"
## [199] "TMEM30A" "TMEM65" "TMX3" "TOP1" "TPP1" "TRA2A"
## [205] "TRAPPC10" "TSN" "TSNAX" "TSR2" "TXNRD1" "UBA1"
## [211] "UBAC1" "UFC1" "UFL1" "VCL" "VPS18" "VPS25"
## [217] "VPS29" "VTN" "WDR1"
background=get_background(multiomics_object_MTG)
scores_group=comparison$dataframe[,c('gene_name','pvalue_transcriptomics','pvalue_proteomics')]
scores_group=scores_group[!duplicated(scores_group$gene_name), ]
rownames(scores_group)=scores_group$gene_name
scores_group=scores_group[,2:3]
colnames(scores_group)=c('Transcriptomics','Proteomics')
scores_group=as.matrix(scores_group)
ActivePathways=ActivePathways::ActivePathways(scores_group,
gmt=GO_fp,
background = background,
cutoff=0.2)
DT::datatable(ActivePathways,
rownames = FALSE,
escape = FALSE,
options = list(pageLength = 20,
scrollX=T,
autoWidth = TRUE,
dom = 'Blfrtip'))multiGSEA works with nested lists where each sublist represents an omics layer. The function rankFeatures calculates the pre-ranked features, that are needed for the subsequent calculation of the enrichment score. rankFeatures calculates the a local statistic ls based on the direction of the fold change and the magnitude of its significance
By default, the vertical integration is performed on processed slots.
DIABLO is a supervised multi-omics integration model
that requires the input of a dependent variable. The function will
perform tuning on the defined range, and number of components
chosen.
multiomics_object_MTG=vertical_integration(multiomics_object_MTG,
slots = c(
"rna_processed",
"protein_processed"
),
integration='DIABLO',
intersect_genes=FALSE,
ID_type='gene_name',
dependent='diagnosis',
levels= c("Control", "AD"),
design="cor",
ncomp=2,
range=list(mRNA=seq(100,1000,by=100),
proteins=seq(20,1000,by=100)))## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## harmonizing input:
## removing 68 sampleMap rows not in names(experiments)
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
## Found more than one class "DataFrame" in cache; using the first, from namespace 'S4Vectors'
## Also defined by 'PythonEmbedInR'
##
## ── MODEL TUNING ──
##
## Design matrix has changed to include Y; each block will be
## linked to Y.
##
## You have provided a sequence of keepX of length: 10 for block mRNA and 10 for block proteins.
## This results in 100 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
##
## You can look into the 'BPPARAM' argument to speed up computation time.
##
## tuning component 1
##
|
| | 0%
|
|============== | 20%
|
|============================ | 40%
|
|========================================== | 60%
|
|======================================================== | 80%
|
|======================================================================| 100%
## tuning component 2
##
|
| | 0%
|
|============== | 20%
|
|============================ | 40%
|
|========================================== | 60%
|
|======================================================== | 80%
|
|======================================================================| 100%
## Design matrix has changed to include Y; each block will be
## linked to Y.
## ✔ VERTICAL INTEGRATION WITH DIABLO
The user may want to perform the integrative downstream analyses
step-by-step, or to run integrative_results_supervised,
which will create a results object containing, multi-omics signatures,
network visualisation, cell type and functional enrichment, as well as
comparison of signatures against the OpenTargets knowledge base.
multiomics_object_MTG=integrative_results_supervised(multiassay=multiomics_object_MTG,
integration_model = "DIABLO",
dependent = "diagnosis",
component=1,
sense_check_variable = "PHF1",
covariates = c("PHF1", "amyloid", "pTau","AD","Braak"),
correlation_threshold = 0.4,
weights_threshold = 0.5,
community_detection = "leading_eigen",
disease_id = "MONDO_0004975",
TF_fp=TF_fp)## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.


